One year with an iPSC/860

Author

  • Eric Barszcz
Abstract

This paper describes experiences over the past year with the Intel iPSC/860, a distributed memory MIMD parallel computer based on the Intel i860 floating point processor. The system at NASA Ames Research Center has 128 nodes and a theoretical peak performance of over seven GFLOPS. The paper describes the system at Ames Research Center and discusses system stability, compiler performance as measured by the NAS kernels, and results from a two-dimensional computational fluid dynamics application.

In spring of 1989 DARPA and Intel Scientific Computers announced the "Touchstone" project. This project calls for the development of a series of prototype machines by Intel Scientific Computers, based on hardware and software technologies being developed by Intel in collaboration with research teams at CalTech, Illinois. One of the milestones is the "Gamma" prototype. On December 29, 1989, the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center took delivery of one of the first two Intel Touchstone Gamma prototypes. The system is marketed commercially as the Intel iPSC/860 and will be referred to as the iPSC/860 for the remainder of the paper. For a review of early experiences with the iPSC/860 at Ames Research Center and Oak Ridge National Laboratory, see [1] and [5] respectively.

The iPSC/860 system is based on the 64-bit i860 microprocessor by Intel [6]. The i860 runs at 40 MHz (the initial system was delivered with 33 MHz processors, which were upgraded to 40 MHz). The theoretical peak speed is 80 MFLOPS for 32-bit floating point and 60 MFLOPS for 64-bit floating point operations. There are thirty-two 32-bit integer address registers and sixteen 64-bit floating point registers (which may be used as thirty-two 32-bit floating point registers). Floating point register 0 is hardwired to zero, so only fifteen 64-bit floating point registers can hold non-zero values.
The i860 also has an 8 kilobyte data cache and a 4 kilobyte instruction cache on-chip. The data path between cache and registers is 128 bits wide. The data path between main memory and registers is 64 bits wide. The i860 has a number of features to facilitate high execution rates. First of all, a number of important operations, including floating point add, multiply, and loads from main memory, can be pipelined. When pipelined floating point operations are used, they are segmented into three stages, and a new operation can be …
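
The aggregate figure quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the system peak is just the per-node peak multiplied by the node count, and that "over seven GFLOPS" refers to the 64-bit rate (the abstract does not say which precision is meant). The names `NODES`, `PEAK_MFLOPS_32BIT`, `PEAK_MFLOPS_64BIT`, and `aggregate_peak_gflops` are hypothetical and do not come from the paper.

```python
# Figures quoted in the abstract.
NODES = 128              # nodes in the NASA Ames system
PEAK_MFLOPS_32BIT = 80   # per-node theoretical peak, 32-bit floating point
PEAK_MFLOPS_64BIT = 60   # per-node theoretical peak, 64-bit floating point


def aggregate_peak_gflops(nodes: int, mflops_per_node: float) -> float:
    """Sum the per-node theoretical peaks and convert MFLOPS to GFLOPS.

    Assumption: every node reaches its peak simultaneously and no
    communication cost is counted, so this is an upper bound only.
    """
    return nodes * mflops_per_node / 1000.0


peak_64 = aggregate_peak_gflops(NODES, PEAK_MFLOPS_64BIT)
peak_32 = aggregate_peak_gflops(NODES, PEAK_MFLOPS_32BIT)

print(f"64-bit aggregate peak: {peak_64:.2f} GFLOPS")  # 7.68
print(f"32-bit aggregate peak: {peak_32:.2f} GFLOPS")  # 10.24
```

Under these assumptions the 64-bit result, 7.68 GFLOPS, is consistent with the "over seven GFLOPS" stated for the 128-node system.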

Similar resources

An Efficient Parallel Algorithm for Solving N-nephron Models of the Renal Inner Medulla

A parallel algorithm for solving the multinephron model of the renal inner medulla is developed. The intrinsic nature of this problem supplies sufficient symmetry for a high-level parallelism on distributed-memory parallel machines such as the iPSC/860, Paragon, and CM-5. Parallelization makes it feasible to study interesting models such as the rat kidney with 30,000 nephrons. On a high-end work...

Porting Ncube Packages to iPSC/860 and iPSC/2 Hypercubes

This report collects the experiences in porting the Sparse Matrix Solver of the Pellpack from Ncube/2 to Intel hypercubes. The differences between these machines are discussed and some important machine-dependent issues are raised and solved. An Ncube/2 library has been installed on the Intel iPSC/2 and iPSC/860. Steps to run Ncube/2 code on the Intel hypercubes are listed for both the specific ...

A Performance Assessment of Express on the iPSC/2 and iPSC/860 Hypercube Computers

This paper describes the performance evaluation of the Express programming environment on the iPSC/2 and iPSC/860 hypercube computers. Express allows parallel programs to be developed in a completely portable fashion and is available on most commercially available parallel computers as well as networks of workstations. We have developed a set of benchmarks to make a comprehensive performan...

Performance Experiments and Optimizations of PDE Sparse Solvers on Hypercubes

In this report we present the results of experiments with the parallel sparse matrix solver of the Parallel Ellpack System. Three different hypercube parallel machines are used to compare and optimize its performance. After a brief description of the parallel sparse matrix solver and a presentation of the machine parameters and features, the measurements of performance of the sparse solver on t...

Out of Core FFTs in a Parallel Application Environment

A principal mission at the NAS facility is to establish highly parallel computer systems supporting full-scale production use by 1996. In order to fulfill this objective, parallel systems must support high-speed scalable I/O suitable for handling output of large-scale numerical aerodynamic simulation. Pursuant to this goal, we seek to execute an 'out of core' radix 2 Fast Fourier Transform (FF...

Publication date: 1991